GAPWM: a genetic algorithm method for optimizing a position weight matrix
Abstract
MOTIVATION: Position weight matrices (PWMs) are simple models commonly used in motif-finding algorithms to identify short functional elements, such as cis-regulatory motifs, on genes. When few experimentally verified motifs are available, the PWM may be poorly estimated, and the resulting PWM may not reliably discriminate a true motif from a false one. While experimentally identifying such motifs remains time-consuming and expensive, low-resolution binding data from techniques such as ChIP-on-chip and ChIP-PET have become available. We propose a novel but simple method to improve a poorly estimated PWM using ChIP data.

METHODOLOGY: Starting from an existing PWM, a set of ChIP sequences, and a set of background sequences, our method, GAPWM, derives an improved PWM via a genetic algorithm that maximizes the area under the receiver operating characteristic (ROC) curve. GAPWM can easily incorporate prior information such as base conservation. We tested our method on two PWMs (Oct4/Sox2 and p53) using three recently published ChIP data sets (human Oct4, mouse Oct4 and human p53).

RESULTS: GAPWM substantially increased the sensitivity and specificity of a poorly estimated PWM and further improved the quality of a good PWM. Furthermore, it still worked when the starting PWM contained a major error. The ROC performance of GAPWM compared favorably with that of MEME and other methods. With the increasing availability of ChIP data, our method provides an alternative for obtaining high-quality PWMs for genome-wide identification of transcription factor binding sites.

AVAILABILITY: The C source code and all data used in this report are available at http://dir.niehs.nih.gov/dirbb/gapwm.

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
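The optimization loop summarized above (score ChIP and background sequences with a PWM, measure the ROC AUC, and let a genetic algorithm search for a PWM with a higher AUC) can be sketched as follows. This is a minimal Python illustration of the general idea, not GAPWM's published C implementation. The function names (`best_site_score`, `roc_auc`, `fitness`, `mutate`, `crossover`, `optimize_pwm`) are hypothetical, and the uniform background model, forward-strand-only scoring, Gaussian column mutation, uniform crossover, truncation selection and all parameter values are assumptions of this sketch. A PWM is represented as a list of per-position base-probability dicts with strictly positive entries; prior information such as base conservation is not modeled here.

```python
# Hypothetical sketch of GA-based PWM optimization by ROC AUC; not GAPWM's own code.
import math
import random

BASES = "ACGT"
BACKGROUND_FREQ = 0.25  # assumption: uniform i.i.d. background model


def best_site_score(pwm, seq):
    """Highest log-odds score of any length-w window of seq (forward strand only)."""
    w = len(pwm)
    best = -math.inf
    for i in range(len(seq) - w + 1):
        s = sum(math.log(pwm[j][b] / BACKGROUND_FREQ)
                for j, b in enumerate(seq[i:i + w]))
        best = max(best, s)
    return best


def roc_auc(pos, neg):
    """Area under the ROC curve via the Mann-Whitney statistic (ties count 1/2)."""
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


def fitness(pwm, chip, background):
    """How well the PWM's best-site scores separate ChIP from background sequences."""
    return roc_auc([best_site_score(pwm, s) for s in chip],
                   [best_site_score(pwm, s) for s in background])


def mutate(pwm, rng, scale=0.1):
    """Add Gaussian noise to one column, keep entries positive, and renormalize."""
    child = [dict(col) for col in pwm]
    j = rng.randrange(len(child))
    for b in BASES:
        child[j][b] = max(1e-3, child[j][b] + rng.gauss(0.0, scale))
    total = sum(child[j].values())
    for b in BASES:
        child[j][b] /= total
    return child


def crossover(a, b, rng):
    """Uniform crossover: each motif position is copied from one of the parents."""
    return [dict(ca if rng.random() < 0.5 else cb) for ca, cb in zip(a, b)]


def optimize_pwm(start, chip, background, pop_size=20, generations=50, seed=0):
    """Evolve PWMs from `start`, keeping the top half each generation (elitist)."""
    rng = random.Random(seed)
    population = [start] + [mutate(start, rng) for _ in range(pop_size - 1)]
    for _ in range(generations):
        population.sort(key=lambda p: fitness(p, chip, background), reverse=True)
        parents = population[: pop_size // 2]  # truncation selection
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b, rng), rng))
        population = parents + children
    return max(population, key=lambda p: fitness(p, chip, background))
```

For example, `optimize_pwm(start, chip_seqs, bg_seqs)` returns the fittest PWM found. Because the best individual is always carried into the next generation, its AUC on the input sequences is never lower than that of the starting matrix; since that AUC is measured on the same sequences used for optimization, a held-out set would be needed to judge whether the improvement generalizes.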
Journal title:
- Bioinformatics
Volume 23, Issue 10
Pages: -
Publication date: 2007